AITopics | player and arm

Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets

Pagare, Tejas, Ghosh, Avishek

arXiv.org Machine Learning, Aug-16-2024

Online learning in decentralized two-sided matching markets, where the demand side (players) competes to match with the supply side (arms), has received substantial interest because it abstracts out the complex interactions in matching platforms (e.g. UpWork, TaskRabbit). However, past works assume that each arm knows its preference ranking over the players (one-sided learning), and that each player aims to learn its preferences over arms through successive interactions. Moreover, several (impractical) assumptions on the problem are usually made for theoretical tractability, such as broadcast player-arm match Liu et al. (2020; 2021); Kong & Li (2023) or serial dictatorship Sankararaman et al. (2021); Basu et al. (2021); Ghosh et al. (2022). In this paper, we study a decentralized two-sided matching market where we do not assume that the preference rankings over players are known to the arms a priori. Furthermore, we do not make any structural assumptions on the problem. We propose a multi-phase explore-then-commit type algorithm, namely epoch-based CA-ETC (collision avoidance explore then commit) (\texttt{CA-ETC} in short), for this problem that does not require any communication across agents (players and arms) and is hence decentralized. We show that for an initial epoch length of $T_{\circ}$ and subsequent epoch lengths of $2^{l/\gamma} T_{\circ}$ (for the $l$-th epoch, with $\gamma \in (0,1)$ as an input parameter to the algorithm), \texttt{CA-ETC} yields a player-optimal expected regret of $\mathcal{O}\left(T_{\circ} (\frac{K \log T}{T_{\circ} \Delta^2})^{1/\gamma} + T_{\circ} (\frac{T}{T_{\circ}})^\gamma\right)$ for the $i$-th player, where $T$ is the learning horizon, $K$ is the number of arms and $\Delta$ is an appropriately defined problem gap. Furthermore, we propose a blackboard-communication-based baseline achieving logarithmic regret in $T$.
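To make the epoch schedule and the two terms of the regret bound in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names `epoch_lengths` and `regret_bound_terms` are hypothetical, and it assumes that epochs are indexed from $l = 0$ (so the initial epoch has length $T_{\circ}$), that non-integer lengths are rounded up, and that the last epoch is truncated at the horizon $T$. The constants hidden by the $\mathcal{O}(\cdot)$ notation are ignored.

```python
import math


def epoch_lengths(t0, gamma, horizon):
    """Epoch lengths 2^(l/gamma) * t0 for l = 0, 1, 2, ... until the horizon is used up.

    Assumptions (not stated in the abstract): l starts at 0, lengths are
    rounded up to integers, and the final epoch is cut off at the horizon.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    lengths = []
    used = 0
    l = 0
    while used < horizon:
        length = math.ceil((2 ** (l / gamma)) * t0)
        length = min(length, horizon - used)  # truncate the last epoch
        lengths.append(length)
        used += length
        l += 1
    return lengths


def regret_bound_terms(t0, gamma, horizon, num_arms, gap):
    """Evaluate the two terms inside the O(.) bound, with constants dropped."""
    first = t0 * ((num_arms * math.log(horizon)) / (t0 * gap ** 2)) ** (1 / gamma)
    second = t0 * (horizon / t0) ** gamma
    return first, second


if __name__ == "__main__":
    print(epoch_lengths(t0=10, gamma=0.5, horizon=1000))
    print(regret_bound_terms(t0=10, gamma=0.5, horizon=1000, num_arms=5, gap=0.5))
```

With $\gamma = 0.5$ each epoch is four times as long as the previous one. A smaller $\gamma$ makes epochs grow faster and shrinks the second term, at the cost of a larger exponent $1/\gamma$ on the first term.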

algorithm, player and arm, preference ranking, (15 more...)

arXiv.org Machine Learning

2408.0869

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Kosovo > District of Gjilan > Kamenica (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry:

Banking & Finance > Trading (0.55)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bandit Learning in Decentralized Matching Markets

Liu, Lydia T., Ruan, Feng, Mania, Horia, Jordan, Michael I.

arXiv.org Machine Learning, Dec-14-2020

A fundamental question at the intersection of learning theory and game theory is as follows: how should individually rational agents act when they have to learn about the consequences of their actions in the same uncertain environment? The emerging research area of multiplayer learning, which explores this basic question and is motivated by a broad range of modern applications, from modeling competition among firms [Mansour et al., 2018, Aridor et al., 2019] to implementing protocols for wireless networks [Liu and Zhao, 2010, Cesa-Bianchi et al., 2016, Shahrampour et al., 2017], has been of increasing interest. A particularly salient application is the online marketplace, which can often be modeled as a two-sided matching market with uncertainty. Examples include online labor markets (Upwork and TaskRabbit for freelancing, Handy for housecleaning), online crowdsourcing platforms (Amazon Mechanical Turk), and peer-to-peer platforms (Airbnb). The multi-armed bandit is a core learning problem in which a player is faced with a choice among K actions, each of which is associated with a reward distribution, and the goal is to learn which action has the highest reward, doing so as quickly as possible so as to be able to reap rewards even while the learning process is underway. To introduce a game-theoretic aspect into the bandit problem, it is natural to place the problem into the context of a two-way matching market, where the choices faced by the players are identified with the entities on the other side of the market, and where the need to realize a matching imposes economic constraints and incentives. Such a blend of bandit learning with two-sided matching markets was introduced by Liu et al. [2020], who formulated a problem in which the players and the arms form the two sides of the market, and each side has preferences over the other side.
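The single-player multi-armed bandit described above can be illustrated with a generic explore-then-commit sketch in Python. This is a textbook-style illustration, not the algorithm of Liu et al.: the Bernoulli reward model, the function name `explore_then_commit`, and all parameter values are assumptions made for the example. There is only one player here, so no matching constraints or collisions arise.

```python
import random


def explore_then_commit(means, pulls_per_arm, horizon, seed=0):
    """Single-player explore-then-commit on Bernoulli arms (illustrative only).

    means: true success probability of each arm (unknown to the player).
    pulls_per_arm: number of exploratory pulls spent on every arm.
    horizon: total number of rounds; must cover the exploration phase.
    Returns the index of the committed arm and the total reward collected.
    """
    k = len(means)
    if horizon < k * pulls_per_arm:
        raise ValueError("horizon is too short for the exploration phase")
    rng = random.Random(seed)
    totals = [0.0] * k
    reward = 0.0

    # Exploration: pull each arm the same number of times, one arm after another.
    for arm in range(k):
        for _ in range(pulls_per_arm):
            r = 1.0 if rng.random() < means[arm] else 0.0
            totals[arm] += r
            reward += r

    # Commit: every arm was pulled equally often, so comparing the reward
    # totals is the same as comparing the empirical means.
    best = max(range(k), key=lambda a: totals[a])
    for _ in range(horizon - k * pulls_per_arm):
        reward += 1.0 if rng.random() < means[best] else 0.0
    return best, reward


if __name__ == "__main__":
    arm, total = explore_then_commit([0.1, 0.9, 0.2], pulls_per_arm=200, horizon=5000)
    print("committed arm:", arm, "total reward:", total)
```

Longer exploration makes it more likely that the committed arm is truly the best one, but spends more rounds on inferior arms. This is the trade-off the passage describes as reaping rewards while learning is still underway.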

artificial intelligence, data mining, machine learning, (20 more...)